Abstract 

Six methods for assembling tests from a pool with an item-set structure are presented. All 
methods are computational and based on the technique of mixed integer programming. The 
methods are evaluated using such criteria as the feasibility of their linear programming 
problems and their expected solution times. The methods are illustrated for two item pools 
with a set structure from the Law School Admission Test (LSAT). 
Optimal Assembly of Tests with Item Sets 

A well-known format in achievement testing is the one of a test with sets of items 
related to a common stimulus. The format has been ubiquitous in testing of reading 
comprehension where examinees are typically offered a series of text passages each 
followed by a set of questions on them. Other examples can be found in testing of 
achievements in science when sets of items relate to a description of a common data set or 
experiment, or in law exams with sets of questions addressing a common lawsuit. The use 
of tests with an item-set structure has become popular lately as a result of the trend toward 
making testing more performance based. 

Assembling tests from an item pool with a set structure tends to be much more 
complicated than from a pool of self-contained items, mainly because such tests have to obey 
more complicated lists of specifications. For example, specifications for tests with item sets 
involve constraints not only on item and test attributes but also on stimulus attributes 
and on distributions of item attributes within item sets. In addition, this type of test 
assembly has to meet the following logical or Boolean constraints: 

(1) if any of the items in a set is selected, its stimulus is selected; 

(2) if any of the items in a set is selected, a minimum and/or maximum number 
of the items in the set is selected. 
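
The two logical constraints above can be checked mechanically for any candidate selection. The following is a minimal sketch, not part of the original paper; the function name, argument names, and the dictionary layout are hypothetical and chosen for illustration only.

```python
def satisfies_set_logic(z, x, n_min, n_max):
    """Check logical constraints (1) and (2) for a candidate selection.

    All names are illustrative assumptions:
    z[s]      -- 1 if stimulus s is selected, 0 otherwise
    x[s]      -- list of 0/1 flags, one per item nested under stimulus s
    n_min[s]  -- minimum number of items when set s is used
    n_max[s]  -- maximum number of items when set s is used
    """
    for s in z:
        k = sum(x[s])  # number of items selected from set s
        if k == 0:
            continue  # no item selected, so neither constraint is triggered
        if z[s] != 1:
            return False  # violates (1): items selected without their stimulus
        if not (n_min[s] <= k <= n_max[s]):
            return False  # violates (2): set size outside its bounds
    return True
```

A check of this kind only verifies a given selection; finding a selection that also optimizes the test information function is the task of the programming models described later in the paper.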

This paper presents a number of methods for assembling tests from pools with 
item sets. All methods are computational and based on the technique of mixed integer 
programming (LP). The technique will be briefly introduced in the description of the first 
method below. A more general introduction to LP-based test assembly and a review of its 
current applications are given in van der Linden (1998). 

It is assumed that test assembly is IRT based, that is, its objective is to assemble a 
test with an information function that has to meet a given target (Birnbaum, 1968). In the 
empirical examples in this paper, the 3-parameter logistic (3-PL) model is assumed to 
hold. The response function for item i in this model is given by: 

\[ p_i(\theta) = c_i + (1 - c_i)\,\frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}, \tag{1} \]

where $\theta \in (-\infty, \infty)$ is the examinee parameter, $b_i \in (-\infty, \infty)$ is the item difficulty, $a_i \in [0, \infty)$ is 
the item discrimination, and $c_i \in [0, 1]$ is a parameter needed to deal with guessing on the 
item. The question of whether pools with item sets are likely to fit the model in Equation 1 is 
deliberately omitted here (for this question, see Rosenbaum, 1987). 

The paper is organized as follows. First, the various types of constraints on item 
selection possible in test assembly with item sets are described. Then six different methods 
and their associated mixed integer programming models for test assembly subject to such 
constraints are introduced. The methods are evaluated using such criteria as the feasibility 
of their LP problem and their expected solution times. The final section of the paper 
presents some empirical examples in which the results for these methods are compared for 
two item pools from the Law School Admission Test (LSAT). 

Constraints on Test Assembly with Item Sets 

Specifications for tests with item sets typically address attributes defined at three 
different levels in the test (individual items; sets; complete test). In addition, they imply 
item-selection constraints on attributes at their primary level but often also at higher levels 
of aggregation. As an example of the distinction between attribute level and constraint 
level, consider the following specification: 

"No item set in the test should have more than two items with a multiple- 
choice format." 

This specification addresses an attribute defined at item level ("response format") but 
involves a constraint on this attribute at the level of the item sets ("no more than two 
multiple-choice items per set"). 

The following classifications of attribute and constraint level are used to formulate 
the test assembly methods later in this paper: 
Attribute Levels 

Three different attribute levels are distinguished. At each of these levels several 
types of attributes can be met. However, in practice some types of attributes are more 
likely to occur at certain levels than others. The attribute levels addressed in this paper 
are: 

1. Item level. Examples of item attributes are: content, cognitive level, values 
of statistical parameters, format, and word counts. Some of these attributes 
are categorical, that is, imply a partition of the item pool with each class 
representing a categorical value of the attribute (e.g., "response format" with 
values "multiple-choice" and "constructed response"); others are quantitative 
(e.g., item p-values). The two types of attributes lead to different types of 
constraints (for examples, see Equations 5-8 below). 

2. Stimulus level. Stimuli can have the same kind of categorical attributes as 
items (content; cognitive level; etc.). However, except for an attribute such as 
word count, they are unlikely to have quantitative attributes associated with 
them. In particular, they seldom have statistical attributes. 

3. Test level. Attributes can also be defined at the level of the complete test. 
Examples are: test length, maximum distance between the test information 
function and a target, and (classical) reliability of the test. Attributes at this 
level are generally quantitative and statistical by nature. 

Constraint Levels 

Constraints at four different levels are distinguished. Each constraint level 
addresses attributes defined at the same level or aggregates of attributes defined at a lower 
level in the test. The levels considered in this paper are: 

1. Item level. Constraints at item level generally stipulate the inclusion or 
exclusion of items with certain attribute values from the test. Examples of 
constraints formulated at item level are: 

"Each item should be on analytic reasoning"; 

"No completion items should be used". 

2. Item-set level. Constraints at item-set level control the distribution of the 
values of categorical item attributes, require a function of the values of 
quantitative item attributes to be between bounds, or require the 
simultaneous occurrence of certain item and stimulus attributes. Examples 
of these types of constraints are: 

"Each item set should have at least two items on 
applications"; 

"The average p-value of the first item set should not be 
smaller than .60"; 

"Item sets with a stimulus describing a physics experiment 
should have no more than two items with graphical 
information." 

3. Stimulus level. Just as at item level, constraints at stimulus level govern the 
inclusion or exclusion of stimuli with certain attribute values from the test. 
An example of a constraint at stimulus level is: 

"No stimulus should have a word count larger than 350 
words". 

4. Test level. Constraints at test level apply either to test attributes or to 
distributions or functions of values of item or stimulus attributes. Examples 
of constraints at this level are: 

"The test should have three stimuli presenting a recent 
newspaper article"; 

"The test information function should be uniform over the 
interval from $\theta = -2.0$ to $\theta = 1.5$". 



As already noted, the above classification of constraint levels implies a hierarchical 
structure with respect to the attribute levels. Each constraint is formulated at the same 
level as the attributes it addresses or higher, but never lower. In fact, attributes themselves 
may have a hierarchical structure too, in particular if they are quantitative. Examples of 
such attributes are test information function and the classical reliability coefficient; both 
are defined as mathematical functions of lower-level attributes (item information, p-values 
and covariances between items). 

Finally, it is observed that the two classifications may have to be extended with an 
intermediate level when the test has subtests or sections. Likewise, higher levels may have 
to be added, for example, when a set of parallel test forms or a set of tests for use in 
multi-stage testing is assembled. 

Methods of Test Assembly 

Six different methods for assembling tests with item sets are presented. Some of 
these methods are exact; others require manual preprocessing of the item pool or have a 
heuristic element. The features of these methods will be evaluated against each other after 
the methods have been described. 

Method 1: Simultaneous Selection of Items and Sets 

The key feature of this method is that separate decision variables for the selection 
of items and stimuli are defined. The variables are used to model the constraints to be 
imposed on the selection of items and stimuli. Special constraints are added to keep the 
selection of items and stimuli consistent, that is, to prevent items (stimuli) from being selected 
without their stimuli (items). This first method was introduced in van der Linden 
(1992). 

Let the stimuli in the pool be indexed by $s = 1, \ldots, S$, and the items nested under 
stimulus $s$ by $i_s = 1, \ldots, I_s$. Variables $z_s$ are used to select the stimuli; they take the value 1 if 
stimulus $s$ is selected for the test and the value 0 otherwise. Likewise, 0-1 variables $x_{i_s}$ are 
defined for the decision on item $i_s$. 

It is assumed that target values, $T(\theta_k)$, $k = 1, \ldots, K$, are specified for the value of the 
test information function at $\theta_k$. The value of the information for item $i_s$ at $\theta_k$ is denoted 
as $I_{i_s}(\theta_k)$. In the model below, for each value $\theta_k$ the test information function is required 
to be in the interval $(T(\theta_k) - y, T(\theta_k) + y)$, where $y$ is a (real-valued) variable defining the 
size of the interval. The objective of the decision problem is to minimize $y$. For a more 
extensive description of this minimax objective, see van der Linden and Boekkooi-Timminga 
(1989). 

In addition, the following notation is needed: 

$q_{i_s}$: value of item $i_s$ on quantitative attribute $q$; 

$r_s$: value of stimulus $s$ on quantitative attribute $r$; 

$C_g$: set of indices of items with value $g$ on categorical attribute $C$, $g = 1, \ldots, G$; 

$D_h$: set of indices of stimuli with value $h$ on categorical attribute $D$, $h = 1, \ldots, H$; 

$n$: number of items in the test. 

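The following model for simultaneous selection of items and stimuli is presented: 

\[ \text{minimize } y \tag{2} \]

subject to 

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} I_{i_s}(\theta_k)\, x_{i_s} + y \ge T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{3} \]

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} I_{i_s}(\theta_k)\, x_{i_s} - y \le T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{4} \]

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} q_{i_s}\, x_{i_s} \ge n_q^{(l)}, \quad \text{(quantitative item attribute)} \tag{5} \]

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} q_{i_s}\, x_{i_s} \le n_q^{(u)}, \quad \text{(quantitative item attribute)} \tag{6} \]

\[ \sum_{i_s \in C_g} x_{i_s} \ge n_g^{(l)}, \quad g = 1, \ldots, G, \quad \text{(categorical item attribute)} \tag{7} \]

\[ \sum_{i_s \in C_g} x_{i_s} \le n_g^{(u)}, \quad g = 1, \ldots, G, \quad \text{(categorical item attribute)} \tag{8} \]

\[ \sum_{s=1}^{S} r_s\, z_s \ge n_r^{(l)}, \quad \text{(quantitative stimulus attribute)} \tag{9} \]

\[ \sum_{s=1}^{S} r_s\, z_s \le n_r^{(u)}, \quad \text{(quantitative stimulus attribute)} \tag{10} \]

\[ \sum_{s \in D_h} z_s \ge n_h^{(l)}, \quad h = 1, \ldots, H, \quad \text{(categorical stimulus attribute)} \tag{11} \]

\[ \sum_{s \in D_h} z_s \le n_h^{(u)}, \quad h = 1, \ldots, H, \quad \text{(categorical stimulus attribute)} \tag{12} \]

\[ \sum_{s=1}^{S} z_s = m, \quad \text{(number of item sets)} \tag{13} \]

\[ \sum_{i_s=1}^{I_s} x_{i_s} - n_s^{(l)} z_s \ge 0, \quad s = 1, \ldots, S, \quad \text{(number of items per set)} \tag{14} \]

\[ \sum_{i_s=1}^{I_s} x_{i_s} - n_s^{(u)} z_s \le 0, \quad s = 1, \ldots, S, \quad \text{(number of items per set)} \tag{15} \]

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} x_{i_s} - n^{(l)} \ge 0, \quad \text{(test length)} \tag{16} \]

\[ \sum_{s=1}^{S} \sum_{i_s=1}^{I_s} x_{i_s} - n^{(u)} \le 0, \quad \text{(test length)} \tag{17} \]

\[ y \ge 0, \quad \text{(definition of decision variable)} \tag{18} \]

\[ x_{i_s} \in \{0, 1\}, \quad i_s = 1, \ldots, I_s, \quad s = 1, \ldots, S, \quad \text{(definition of decision variables)} \tag{19} \]

\[ z_s \in \{0, 1\}, \quad s = 1, \ldots, S. \quad \text{(definition of decision variables)} \tag{20} \]
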
The constraints in Equations 3 and 4 tighten the values of the test information function to 
the common interval $(T(\theta_k) - y, T(\theta_k) + y)$. The size of the interval is minimized in 
Equation 2. Equations 5-8 show how sums of values of quantitative attributes or 
distributions of items across values of categorical attributes can be constrained to meet 
lower (index "l") and upper bounds (index "u"). The same is demonstrated for 
quantitative and categorical stimulus attributes in Equations 9-12. For convenience, examples are 
given for constraints at test level only. Other constraint levels in the earlier classification 
can be realized by adapting the sums in the equations. 

The number of item sets to be selected is set in the constraint in Equation 13. 
Equations 14 and 15 have a double purpose. On the one hand, they constrain the numbers 
of items per stimulus, where $n_s^{(l)}$ and $n_s^{(u)}$ are the lower and upper bounds on the number 
of items in set $s$, respectively. On the other hand, as can easily be verified by substituting 0 
and 1 for $z_s$, the constraints coordinate the selection of items and stimuli. These 
constraints are the logical or Boolean constraints needed for test assembly with item sets 
alluded to earlier. The total number of items in the test is set through Equations 16 and 
17. 

Equations 18-20 constrain the decision variables to their proper domains of 
possible values. Observe that y is a decision variable too. Due to its presence, the problem 
involved in solving Equations 2-20 is known as a mixed integer programming problem. 
General LP software (e.g. CPLEX; see ILOG, 1998) or one of the algorithms in the test 
assembly software package ConTEST (Timminga, van der Linden, & Schweizer, 1996) 
can be used to solve the model for optimal values for the decision variables. Numerical 
aspects of solving models as in Equations 2-20 will be discussed further below. 
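
To make the structure of the model concrete, the following sketch solves a toy version of the problem by exhaustive enumeration instead of mixed integer programming. It is an illustration under stated assumptions, not the solution method used in the paper: it covers only the minimax objective (Equations 2-4), the number of item sets (Equation 13), the numbers of items per set (Equations 14-15), and test length (Equations 16-17), it applies one common pair of bounds to all sets, and it omits the attribute constraints. The function name and the data layout are hypothetical. Enumeration is feasible only for very small pools.

```python
from itertools import combinations, product

def assemble_bruteforce(info, target, m, n_lo, n_hi, set_lo, set_hi):
    """Return (y, sets, items) with the smallest minimax deviation y.

    info[s][i][k] -- information of item i in set s at ability point k
    target[k]     -- target value of the test information function at point k
    m             -- number of item sets in the test
    n_lo, n_hi    -- bounds on test length
    set_lo/set_hi -- bounds on items per selected set (set_lo >= 1 assumed)
    """
    best = None
    K = len(target)
    for sets in combinations(range(len(info)), m):
        # Enumerate every admissible item subset for each chosen set.
        per_set = []
        for s in sets:
            options = []
            for size in range(set_lo, min(set_hi, len(info[s])) + 1):
                options.extend(combinations(range(len(info[s])), size))
            per_set.append(options)
        for choice in product(*per_set):
            n = sum(len(c) for c in choice)
            if not (n_lo <= n <= n_hi):
                continue  # test length out of bounds
            tif = [sum(info[s][i][k] for s, c in zip(sets, choice) for i in c)
                   for k in range(K)]
            # Smallest y with T - y <= TIF <= T + y at every ability point.
            y = max(abs(tif[k] - target[k]) for k in range(K))
            if best is None or y < best[0]:
                best = (y, sets, choice)
    return best
```

Because stimuli are selected first and items only within selected stimuli, the enumeration satisfies the coordination between items and stimuli by construction; in the programming model this coordination has to be enforced by explicit constraints.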

Method 2: Simultaneous Selection with Pivot Items 

In mixed integer programming, solution times generally depend on the numbers of 
variables in the model. It is therefore advantageous to find models for test assembly 
problems with item sets that are based on fewer variables with results that closely 
approximate those for a full simultaneous approach. 

A reduction of the number of variables is possible by assigning one item in each 
item set in the pool the special status of "pivot item". Formally, a pivot item is defined as 
an item selected for the test if, and only if, its stimulus is selected for the test. In practice, 
test specialists can be asked to select as pivot items the ones they feel represent their 
stimuli best and would be their first option if the test were to be assembled by hand. Of 
course, what is "best" should follow from the specifications for the test in combination 
with the relative scarcity of the item attributes in the pool. 

Because the decision variables for pivot items and stimuli have identical values in 
any solution, the decision variables for the pivot items can be used as carriers for the 
attributes of the stimuli and to formulate constraints on stimulus selection. Hence, if pivot 
items have been designated, no separate decision variables for the stimuli are needed. 

Let $i_s^*$ be the index value of the pivot item for stimulus $s$. The only thing needed 
to change the model in Equations 2-20 into a model for Method 2 is: 

1. Substitution of decision variables $x_{i_s^*}$ for decision variables $z_s$. 

2. Omission of the constraints in Equation 20. 

Observe that the constraints in Equations 14 and 15 now guarantee that pivot items are 
selected any time a sufficient number of items for their stimulus is. These constraints thus 
provide the formal definition of the status of the pivot items. 

Method 3: All Items Per Set Selected 

In the previous method, the number of decision variables was reduced by removing 
the variables for the stimuli from the model. A more dramatic reduction is possible if the 
decision variables for the items can be removed. This possibility arises if the numbers of 
items per stimulus in the pool meet the specifications for the test, for example, when the 
pool has to serve only one testing program and the item sets in the pool have been tailored 
to the specifications for this test. Another application arises if all item sets are edited by 
test specialists prior to the test assembly process, removing the worst items from the sets 
until their size meets the specifications. 

In either application, the only decisions left are which stimuli to select for the test. 
As all items in the sets are selected along with their stimulus, aggregated values of the 
item attributes in the sets can be assigned as attributes to the stimuli, and the decision 
variables for the stimuli can be used to formulate constraints on the item attributes. 

In the model in Equations 2-20, item attributes were used in Equations 3 and 4 
(information function values), Equations 5 and 6 (quantitative attributes), and Equations 7 
and 8 (categorical attributes). Constraints on the selection of the items were also 
formulated in Equations 14-17. Let 

\[ n_s = \text{number of items in set } s; \tag{21} \]

\[ c_{gs} = \text{number of items in set } s \text{ with index in } C_g; \tag{22} \]

\[ I_s(\theta_k) = \sum_{i_s=1}^{I_s} I_{i_s}(\theta_k) \quad \text{(item set information)}; \tag{23} \]

\[ q_s = \sum_{i_s=1}^{I_s} q_{i_s} \quad \text{(sum of values on quantitative attribute)}. \tag{24} \]
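
The set-level quantities in Equations 21-24 are simple aggregates of item attributes and can be computed once, before the model is built. The following sketch shows one way to do this for a single item set; the function name and the list-based layout of the item data are assumptions made for illustration.

```python
def aggregate_set(item_info, item_q, item_cat, n_categories):
    """Aggregate the item attributes of one item set (Equations 21-24).

    item_info[i][k] -- information of item i at ability point k
    item_q[i]       -- value of item i on the quantitative attribute
    item_cat[i]     -- category index of item i, in range(n_categories)
    Returns (n_s, c_gs, I_s, q_s).
    """
    n_s = len(item_info)                                      # Equation 21
    c_gs = [sum(1 for g_i in item_cat if g_i == g)            # Equation 22
            for g in range(n_categories)]
    n_points = len(item_info[0]) if item_info else 0
    I_s = [sum(row[k] for row in item_info)                   # Equation 23
           for k in range(n_points)]
    q_s = sum(item_q)                                         # Equation 24
    return n_s, c_gs, I_s, q_s
```

With these aggregates attached to each stimulus, the only remaining decision variables are those for the stimuli.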

The model for Method 3 is derived from the one in Equations 2-20 by making the 
following modifications: 

1. Equations 3-8 and 16-17 are reformulated as: 

\[ \sum_{s=1}^{S} I_s(\theta_k)\, z_s + y \ge T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{25} \]

\[ \sum_{s=1}^{S} I_s(\theta_k)\, z_s - y \le T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{26} \]

\[ \sum_{s=1}^{S} q_s\, z_s \ge n_q^{(l)}, \quad \text{(quantitative item attribute)} \tag{27} \]

\[ \sum_{s=1}^{S} q_s\, z_s \le n_q^{(u)}, \quad \text{(quantitative item attribute)} \tag{28} \]

\[ \sum_{s=1}^{S} c_{gs}\, z_s \ge n_g^{(l)}, \quad g = 1, \ldots, G, \quad \text{(categorical item attribute)} \tag{29} \]

\[ \sum_{s=1}^{S} c_{gs}\, z_s \le n_g^{(u)}, \quad g = 1, \ldots, G, \quad \text{(categorical item attribute)} \tag{30} \]

\[ \sum_{s=1}^{S} n_s\, z_s - n^{(l)} \ge 0, \quad \text{(test length)} \tag{31} \]

\[ \sum_{s=1}^{S} n_s\, z_s - n^{(u)} \le 0. \quad \text{(test length)} \tag{32} \]

2. The constraints on numbers of items per set in Equations 14 and 15 and the 
definition of the decision variables for the items in Equation 19 are 
removed from the model. 

Method 4: Decision Variables for Subsets (Power Set Approach) 

The following method was inspired by an observation in Swanson and Stocking 
(1993, p. 157). If the number of items in set $s$ is equal to $n_s$, the maximum number of 
(nonempty) different sets in the test selected from $s$ is equal to $2^{n_s} - 1$, that is, the number 
of elements in the power set of $s$ minus the null set. Assembling the test can be modeled 
using separate decision variables for each subset and without any variable for the items. 

Let $z_{ps}$, $p = 1, \ldots, 2^{n_s} - 1$, be the decision variable for the $p$th element in the power set of 
item set $s = 1, \ldots, S$. For each element in the power set, definitions for the numbers of items 
and quantitative and categorical item attributes as in Equations 21-24 are introduced. The 
model needed to implement a power set approach is analogous to the one for the previous 
case. The only exceptions are: 

1. The addition of the following set of constraints to prevent selection of more 
than one subset per item set: 

\[ \sum_{p=1}^{2^{n_s} - 1} z_{ps} \le 1, \quad s = 1, \ldots, S; \quad \text{(mutually exclusive subset selection)} \tag{33} \]

2. The replacement of the constraint on the number of item sets to be selected 
in Equation 13 by: 

\[ \sum_{s=1}^{S} \sum_{p=1}^{2^{n_s} - 1} z_{ps} = m. \quad \text{(number of item sets)} \tag{34} \]

Observe that the constraint in Equation 34 works correctly only in combination with the 
ones in Equation 33. 

This method yields an optimal solution. However, its number of variables easily 
becomes large. In fact, the method is practical only when some of the item sets in the pool 
have one or two items too many. In all other cases, Method 1 is superior in the sense that 
it also produces an optimal result but has fewer variables. 
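
The growth in the number of variables can be made concrete with a short calculation. The sketch below counts the subset variables for one item set, either for all nonempty subsets or only for subsets whose size falls within given bounds. Restricting the count to admissible sizes is an assumption of this illustration and is not stated in the description of the method; the function name is hypothetical.

```python
from math import comb

def n_subset_variables(n_s, lo=1, hi=None):
    """Number of subsets of an n_s-item set with size between lo and hi."""
    hi = n_s if hi is None else hi
    return sum(comb(n_s, k) for k in range(lo, hi + 1))

# All nonempty subsets of an 11-item set: 2**11 - 1
print(n_subset_variables(11))        # 2047
# Only subsets of 5 to 7 items from the same set
print(n_subset_variables(11, 5, 7))  # 1254
```

Either count is for a single item set, so a pool with many sets of this size quickly leads to models with tens of thousands of variables.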

Method 5: Two-Stage Selection 

If a mathematical programming problem is too large, an obvious approach is to 
approximate the problem by a series of smaller problems. This strategy is followed in 
Method 5 which is based on two stages: In Stage 1 item sets are selected, whereas in 
Stage 2 the test is assembled from the sets selected in Stage 1. 

The model for Stage 1 is identical to the one for Method 3, with the following 
modifications of the constraints in Equations 25-32: 

1. Use of the constraints with upper bounds on categorical item attributes and 
test length in Equations 30 and 32 can be postponed to Stage 2. However, the 
versions of these constraints with the lower bounds are kept to maximize 
the likelihood of a feasible result in Stage 2. 

2. Constraints on quantitative item attributes are rescaled at item level. For 
example, the constraints on test information in Equations 25 and 26 can be 
reformulated as: 

\[ \sum_{s=1}^{S} n_s^{-1} I_s(\theta_k)\, z_s + y \ge n^{-1} T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{35} \]

\[ \sum_{s=1}^{S} n_s^{-1} I_s(\theta_k)\, z_s - y \le n^{-1} T(\theta_k), \quad k = 1, \ldots, K, \quad \text{(test information function)} \tag{36} \]

where $n$ is the intended test length. This rescaling is necessary to maximize 
the likelihood of a good fit of the information function for the items 
selected in Stage 2. If test length is constrained by different upper and 
lower bounds, the mean of these two values can be chosen as the value of $n$ 
in Equations 35 and 36. 

The model for Stage 2 is identical to the one for Method 1 in Equations 2-20, with 
the following modifications: 

1. The constraints on quantitative and categorical stimulus attributes in 
Equations 9-12 have already been realized at Stage 1 and are no longer 
needed. 

2. The constraints in Equations 14-15 are replaced by 

\[ \sum_{i_s=1}^{I_s} x_{i_s} \ge n_s^{(l)}, \quad \text{(number of items per set)} \tag{37} \]

\[ \sum_{i_s=1}^{I_s} x_{i_s} \le n_s^{(u)}, \quad \text{(number of items per set)} \tag{38} \]

where $s$ now runs over the item sets selected in Stage 1. 

3. The definition of the decision variables in Equation 20 is no longer needed. 

Method 6: Two-Stage Selection (Alternative Version) 

The previous method has the advantage of a small number of decision variables but 
runs the danger of a result in Stage 1 that overconstrains the selection space in Stage 2. A 
potentially useful alternative to the previous method is therefore to select a larger number of 
item sets in Stage 1 than actually needed in Stage 2. In fact, Stage 1 can be used just to 
weed out item sets from the pool unlikely to be selected in Stage 2. 

The model needed for Stage 1 is identical to the one for this stage in Method 5. 
The only difference is the number of item sets selected in Equation 13. The model for 
Stage 2 is identical to the one of simultaneous selection of items and stimuli in Method 1. 
The model is now defined only over the part of the pool selected in Stage 1. 

Discussion 

Method 1 is based on the most general formulation of the test assembly problem. 
Its implementation does not require any manual preprocessing of the item pool. Also, it 
produces an optimal solution, provided the solution can be found in realistic time. 

Methods 2 and 3 are reductions of the original problem based on previous 
assignment of pivot items in the sets and reduction of the size of the item sets by weeding 
out their worst items. However, the reduction in the number of variables should be 
evaluated against the fact that the quality of the solution depends on the results of the 
preprocessing of the item pool. If wrong selections are made at this stage, the solution, 
though optimal in the reduced problem, may be suboptimal in the original problem. 

Method 4 is a generalization of Method 3 in the sense that it has decision variables 
associated not only with the item sets but also with each of their subsets. Like Method 1, 
Method 4 produces an optimal result to the original problem. However, since the number 
of variables in Method 4 increases dramatically as a function of the difference between the 
size of the item sets in the pool and the size requested for the test, it may have much 
more difficulty finding a solution in realistic time than Method 1. 

The advantage of Methods 5 and 6 is that they involve two small problems that can 
be solved quickly for item pools of a realistic size. Also, unlike Methods 2 and 3, these 
methods do not involve any manual preprocessing of the item pool. A potential 
disadvantage of these methods is the possibility of a solution in Stage 1 that does not 
allow a feasible test at Stage 2. Method 6 is expected to perform generally better in this 
respect due to its less stringent selection in Stage 1. 

Empirical Examples 

The methods were applied to the problem of assembling the two sections of the 
LSAT that have an item-set structure. The sections are coded here as SA and SB. (The 
LSAT has a third section that does not have item sets.) The numbers of items and stimuli 
in these two sections and their item pools are given in Table 1. 

[Table 1 about here] 

For both sections of the LSAT, models were formulated for Methods 1-3 and 5-6. 
For Methods 2 and 3, LSAT specialists selected the pivot items and reduced the item sets 
in the pools to appropriate lengths. Method 6 was implemented by selecting twice as many 
item sets in Stage 1 as needed in Stage 2. The models dealt with such attributes as item 
and stimulus types (several levels), possible gender and minority orientation of item sets, 
answer key distributions of the items, and word counts of the stimuli. The numbers of 
variables and constraints in the models for these two sections are given in Table 2. Note 
that the numbers of variables in Methods 5 and 6 for Stage 2 depend on the item 
sets selected in Stage 1. 

[Table 2 about here] 

Method 4 was omitted because of its large number of decision variables. For 
example, for the SA pool a typical item set has 11 items whereas only 5-7 items per set 
are needed in the test. For this set only the number of variables would have been equal to 

The target information functions for SA and SB are shown in Figures 1 and 2. For 
all methods the models constrained the test information functions at $\theta$ = -1.8, -0.9, 0.0, 0.9, 
and 1.8. Solutions to the models were obtained using the branch-and-bound algorithm as 
implemented in CPLEX (ILOG, 1998) on a PC with a Pentium Pro 166 MHz processor. The 
algorithm was stopped as soon as the differences between the test information and target 
values were smaller than 3% of the lowest target value. Since the lowest target value for 
SA was .8892 at $\theta$ = -1.8, the stopping criterion in this case was a maximum difference 
smaller than .03 x .8892, or approximately .03. For SB, the smallest target value and stopping 
criterion were 2.0796 and .06, respectively. Because the objective function in Equation 2 is 
the largest difference between the test information function and target values over all $\theta$ 
values, the stopping criterion could be applied directly to the value of this function. 

[Figures 1-2 about here] 

Table 3 gives some technical results for these two series of examples. All methods 
immediately produced feasible solutions for the two sections. The only exception was the 
combination of Method 6 and SB. In Stage 1, this method selected a combination of item 
sets that did not contain a feasible combination of sets for Stage 2. However, relaxing one 
of the constraints on the item sets, replacing "=2" by "<3", did produce a solution. The 
CPU times for all methods were satisfactory. Methods 1 and 2 had the largest numbers of 
variables and were slowest. Surprisingly, the small reduction of the numbers of variables 
in Method 2 realized by introducing pivot items did not pay off in a smaller CPU time but 
the reduction in Method 3 had a dramatic effect. In fact, all methods based on a larger 
reduction of variables or a two-stage implementation of the selection procedure were very 
quick (generally less than 1 second of CPU time). The last column in Table 3 shows the 
values of the objective function for the solution, $y^*$. As already noted, these values are 
equal to the largest difference between the test information function values and their target 
values across the $\theta$ values used in the models. Methods 1 and 2 produced the best results, 
immediately followed by Method 6 for SB. The other methods produced larger 
differences. 

[Table 3 about here] 
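
The stopping rule used in these examples amounts to a tolerance of 3% of the lowest target value. A minimal sketch of this arithmetic follows; the function name is hypothetical, and only the lowest target values reported in the text (.8892 for SA and 2.0796 for SB) are taken from the paper.

```python
def stopping_tolerance(lowest_target, fraction=0.03):
    """Largest acceptable deviation y: a fraction of the lowest target value."""
    return fraction * lowest_target

# Lowest target values as reported for the two sections
print(round(stopping_tolerance(0.8892), 2))  # 0.03 for SA
print(round(stopping_tolerance(2.0796), 2))  # 0.06 for SB
```

Because the objective function is the largest deviation from the target, the search can stop as soon as the value of $y$ for the current solution falls below this tolerance.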

A graphical presentation of the results is offered in Figures 1-2. For SA and 
Methods 1, 2, and 3, the test information functions were close to the target function. For SB 
the best results were obtained for Methods 1, 2 and 6; the test information functions for 
these methods were virtually indistinguishable from the target function. Also, Methods 3 
and 5, though not satisfactory, performed considerably better for SB than the two worst 
performing methods for SA. Observe that, both for SA and SB, Method 6, which 
constrains the item set selection in Stage 1 less stringently, did better indeed than Method 
5. 

Concluding Remark 

The empirical results in this paper are offered only as an example. Though most 
results were as expected, a surprise was the fact that Methods 2 and 3 outperformed 
Methods 5 and 6 for SA whereas the opposite tendency was observed for SB. These results 
show the dependency of the performance of test assembly methods on the composition of 
the item pool. When generalizing the results in these examples to other applications, this 
dependency should be taken into account. 
Table 1 

Numbers of Items and Stimuli in Pools for SA and SB 

           Pool                                   Test 
Section    #Items  #Stimuli  #Items/Stimulus      #Items  #Stimuli  #Items/Stimulus 
SA         208     24        5-11                 22-24   4         5-7 
SB         240     24        8-12                 26-28   4         5-8 

Table 2 

Numbers of Variables and Constraints in Models for 
Five Test Assembly Methods 

Method   Section     #Variables   #Constraints 
1        SA          233          91 
         SB          265          109 
2        SA          209          91 
         SB          241          109 
3        SA          25           41 
         SB          25           60 
5        SA-1 (1)    25           29 
         SA-2        33 (2)       36 
         SB-1        25           37 
         SB-2        3? (2)       58 
6        SA-1        25           29 
         SA-2        74 (2)       57 
         SB-1        25           37 
         SB-2        8? (2)       75 

Notes: 1. Second code indicates stage; 2. Number is dependent 
on output from Stage 1. 

Table 3 

Technical Results for Five Test Assembly Methods 

Method   Section     Feasibility   CPU Time    y* 
1        SA          +             4-5 mins    .021 
         SB          +             1-2 mins    .011 
2        SA          +             20 mins     .032 
         SB          +             90 mins     .064 
3        SA          +             <1 sec      .473 
         SB          +             <1 sec      1.099 
5        SA-1 (1)    +             <1 sec      .232 
         SA-2        +             <1 sec      1.801 
         SB-1        +             <1 sec      .432 
         SB-2        +             <1 sec      .881 
6        SA-1        +             <1 sec      .838 
         SA-2        +             <1 sec      1.339 
         SB-1        +             <1 sec      1.152 
         SB-2        - (2)         1-2 secs    .049 

Notes: 1. Second code indicates stage; 2. CPU time and value of y were 
obtained after relaxation of one constraint to get a feasible solution. 

Figure Captions 

Figure 1. Target information function and test information functions for Method 1, 2, 3, 5 
and 6 (Section SA). 

Figure 2. Target information function and test information functions for Method 1, 2, 3, 5 
and 6 (Section SB). 